Naive Parallelization of Coordinate Descent Methods and an Application on Multi-core L1-regularized Classification

Authors

  • Yong Zhuang
  • Yuchin Juan
  • Guo-Xun Yuan
  • Chih-Jen Lin
Abstract

It is well known that a direct parallelization of sequential optimization methods (e.g., coordinate descent and stochastic gradient methods) is often not effective. The reason is that at each iteration, the number of operations may be too small. In this paper, we point out that because of the skewed distribution of non-zero values in real-world data sets, this common understanding may not be true if the method sequentially accesses data in a feature-wise manner. Because some features are much denser than others, a direct parallelization of loops in a sequential method may result in excellent speedup. This approach possesses the advantage of retaining all convergence results because the algorithm is not changed at all. We apply this idea to a coordinate descent (CD) method for L1-regularized classification, and explain why direct parallelization should work in practice. Further, an investigation of the shrinking technique commonly used to remove some features during the training process shows that this technique helps the parallelization of CD methods. Experiments indicate that a naive parallelization achieves better speedup than existing methods that laboriously modify the algorithm to achieve parallelism. Though a bit ironic, we conclude that the naive parallelization of the CD method is the best and the most robust multi-core implementation for L1-regularized classification.
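To make the feature-wise access pattern concrete, the sketch below implements a sequential cyclic coordinate descent for the Lasso (an L1-regularized least-squares problem, used here as a simpler stand-in for the L1-regularized classification objective in the paper). The per-feature dot products and residual updates are the loops that, in a multi-core implementation, would be parallelized naively (e.g., with an OpenMP `parallel for`); dense feature columns give those loops enough work per iteration for good speedup. The function names and the Lasso choice are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def soft_threshold(z, lam):
    # Soft-thresholding operator: the closed-form minimizer of the
    # one-variable L1-regularized quadratic subproblem.
    return np.sign(z) * max(abs(z) - lam, 0.0)

def cd_lasso(X, y, lam, n_iters=100):
    """Cyclic coordinate descent for min_w 0.5*||Xw - y||^2 + lam*||w||_1.

    Data is accessed one feature column at a time. The dot product
    X[:, j] @ r and the residual update below are the inner loops over a
    feature's (non-zero) entries; these are what a naive multi-core
    version parallelizes, and skewed (dense) columns make that pay off.
    """
    n, d = X.shape
    w = np.zeros(d)
    r = y - X @ w                      # residual, maintained incrementally
    col_sq = (X ** 2).sum(axis=0)      # per-column squared norms
    for _ in range(n_iters):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # rho = X[:, j]^T (r + w[j] * X[:, j])
            rho = X[:, j] @ r + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (w[j] - w_new)   # keep residual consistent
            w[j] = w_new
    return w
```

With an orthonormal design (`X = np.eye(3)`), each coordinate update reduces to soft-thresholding the corresponding entry of `y`, which makes the behavior easy to check by hand.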


Related Papers

Supplementary Materials for “Naive Parallelization of Coordinate Descent Methods and an Application on Multi-core L1-regularized Classification”

The subproblem is in a form of L1-regularized least squares. To perform a coordinate update in the CD subroutine, the training data is accessed/used in a feature-wise manner, which is the same as how data is used in CDN for problem (3.2). Thus, the applicability of the naive parallelization to the CD subroutine should hold here and the speedup is predicted to be at a comparable level. In additi...


A Comparison of Optimization Methods for Large-scale L1-regularized Linear Classification

Large-scale linear classification is widely used in many areas. The L1-regularized form can be applied for feature selection, but its non-differentiability causes more difficulties in training. Various optimization methods have been proposed in recent years, but no serious comparison among them has been made. In this paper, we discuss several state of the art methods and propose two new impleme...


A Comparison of Optimization Methods and Software for Large-scale L1-regularized Linear Classification

Large-scale linear classification is widely used in many areas. The L1-regularized form can be applied for feature selection; however, its non-differentiability causes more difficulties in training. Although various optimization methods have been proposed in recent years, these have not yet been compared suitably. In this paper, we first broadly review existing methods. Then, we discuss state-o...


Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems

Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of scheduling hardware resources across concurrent threads. In this paper, to resolve the problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...


A distributed block coordinate descent method for training l1 regularized linear classifiers

Distributed training of l1 regularized classifiers has received great attention recently. Most existing methods approach this problem by taking steps obtained from a quadratic approximation of the objective that is decoupled at the individual variable level. These methods are designed for multicore systems where communication costs are low. They are inefficient on systems such as ...



Publication date: 2017